Walkthrough

Adds a URL-based RocksDB Cloud S3 configuration (`rocksdb_cloud_s3_url` / `s3_url`), a `ParseS3Url` parser/validator, and logic that makes URL-derived bucket/path/endpoint override legacy bucket/prefix/object_path/endpoint; updates flags, docs, and S3 client construction to use the effective configuration.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Flags as Flags / CLI / Config
    participant CFG as RocksDBCloudConfig
    participant Parser as ParseS3Url()
    participant Store as RocksDB Cloud Store / S3 Client
    Flags->>CFG: read rocksdb_cloud_s3_url / legacy fields
    alt s3_url provided
        CFG->>Parser: ParseS3Url(s3_url)
        Parser-->>CFG: S3UrlComponents{valid?, protocol, endpoint, bucket, path}
        CFG->>CFG: set s3_url_, derive effective_bucket, effective_path, effective_endpoint\nclear bucket_prefix
        CFG-->>Store: provide effective config (URL-based)
    else no s3_url
        CFG-->>Store: provide legacy config (bucket_name, bucket_prefix, object_path, s3_endpoint)
    end
    Store->>Store: create S3 client (effective_endpoint)\nproceed with storage operations
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
eloq_data_store_service/rocksdb_cloud_dump.cpp (4)
176-184: Allow default AWS credential provider chain (don't require keys).

`parse_arguments` throws if keys are missing, but `main` later supports the provider chain. Relax this to improve UX.
```diff
-    if (params.aws_access_key_id.empty())
-    {
-        throw std::runtime_error("AWS access key ID is required");
-    }
-    if (params.aws_secret_key.empty())
-    {
-        throw std::runtime_error("AWS secret access key is required");
-    }
+    // Credentials may be supplied via env/instance profiles; do not hard-require here.
```
204-266: HTTPS verification disabled and endpoint lowercased — security and correctness.
- verifySSL=false breaks TLS security.
- Lowercasing the entire endpoint can corrupt case-sensitive paths.
Add a flag and preserve endpoint case; verify HTTPS by default.
```diff
+DEFINE_bool(s3_skip_ssl_verify, false,
+            "Skip TLS certificate verification for custom S3 endpoint (not recommended)");
 ...
-    return [endpoint](const std::shared_ptr<Aws::Auth::AWSCredentialsProvider>
+    return [endpoint](const std::shared_ptr<Aws::Auth::AWSCredentialsProvider>
                           &credentialsProvider,
                       const Aws::Client::ClientConfiguration &baseConfig)
        -> std::shared_ptr<Aws::S3::S3Client>
     {
-        // Check endpoint url start with http or https
+        // Check endpoint URL scheme
         if (endpoint.empty())
         {
             return nullptr;
         }
-
-        std::string endpoint_url = endpoint;
-        std::transform(endpoint_url.begin(),
-                       endpoint_url.end(),
-                       endpoint_url.begin(),
-                       [](unsigned char c) { return std::tolower(c); });
+        const std::string endpoint_lc = [&]{
+            std::string t = endpoint;
+            std::transform(t.begin(), t.end(), t.begin(),
+                           [](unsigned char c) { return std::tolower(c); });
+            return t;
+        }();
         bool secured_url = false;
-        if (endpoint_url.rfind("http://", 0) == 0)
+        if (endpoint_lc.rfind("http://", 0) == 0)
         {
             secured_url = false;
         }
-        else if (endpoint_url.rfind("https://", 0) == 0)
+        else if (endpoint_lc.rfind("https://", 0) == 0)
         {
             secured_url = true;
         }
         else
         {
             LOG(ERROR) << "Invalid S3 endpoint URL";
             std::abort();
         }
         // Create a new configuration based on the base config
         Aws::Client::ClientConfiguration config = baseConfig;
-        config.endpointOverride = endpoint_url;
+        config.endpointOverride = endpoint;  // preserve original case
         if (secured_url)
         {
             config.scheme = Aws::Http::Scheme::HTTPS;
         }
         else
         {
             config.scheme = Aws::Http::Scheme::HTTP;
         }
-        // Disable SSL verification for HTTPS
-        config.verifySSL = false;
+        // Verify TLS by default; allow opt-out for dev-only scenarios
+        config.verifySSL = secured_url && !FLAGS_s3_skip_ssl_verify;
```
476-484: Bug: status not assigned from ListColumnFamilies result.

You check a stale status. Capture the return.

```diff
-    rocksdb::DBCloud::ListColumnFamilies(
-        options, params.db_path, &column_families);
+    status = rocksdb::DBCloud::ListColumnFamilies(
+        options, params.db_path, &column_families);
     if (!status.ok())
     {
         LOG(ERROR) << "Cannot list column families: " << status.ToString();
         throw std::runtime_error("Cannot list column families: " +
                                  status.ToString());
     }
```
363-388: Ensure Aws::ShutdownAPI runs on exceptions.

Move `Aws::SDKOptions` outside the try block and guard Shutdown in the catch.

```diff
-int main(int argc, char **argv)
+int main(int argc, char **argv)
 {
     // Initialize Google's logging library
     google::InitGoogleLogging(argv[0]);
     FLAGS_logtostderr = 1;  // Log to stderr by default

     // Set up gflags
     gflags::SetUsageMessage("RocksDB Cloud Dump Utility");
     gflags::SetVersionString("1.0.0");

-    // Parse command line flags
+    // Parse command line flags
     gflags::ParseCommandLineFlags(&argc, &argv, true);

+    Aws::SDKOptions aws_options;
+    bool aws_inited = false;
     try
     {
         // Parse command line arguments
         CmdLineParams params = parse_arguments();

         // Create directory for DB path if it doesn't exist
         std::filesystem::create_directories(params.db_path);

-        // Initialize AWS SDK
-        Aws::SDKOptions aws_options;
-        Aws::InitAPI(aws_options);
+        // Initialize AWS SDK
+        Aws::InitAPI(aws_options);
+        aws_inited = true;
         ...
-        Aws::ShutdownAPI(aws_options);
+        Aws::ShutdownAPI(aws_options);
         google::ShutdownGoogleLogging();
         return 0;
     }
     ...
-        Aws::ShutdownAPI(aws_options);
+        Aws::ShutdownAPI(aws_options);
         return 0;
     }
     catch (const std::exception &e)
     {
         LOG(ERROR) << "Error: " << e.what();
         print_usage(argv[0]);
+        if (aws_inited) Aws::ShutdownAPI(aws_options);
         google::ShutdownGoogleLogging();
         return 1;
     }

     // Clean up
     gflags::ShutDownCommandLineFlags();
     google::ShutdownGoogleLogging();
```

Also applies to: 663-669
eloq_data_store_service/rocksdb_cloud_data_store.cpp (3)
219-249: HTTPS verification disabled and endpoint lowercased; fix like dump tool.

Preserve endpoint case; verify HTTPS by default.

```diff
-    std::string endpoint_url = toLower(endpoint);
+    const std::string endpoint_lc = toLower(endpoint);
     bool secured_url = false;
-    if (endpoint_url.rfind("http://", 0) == 0)
+    if (endpoint_lc.rfind("http://", 0) == 0)
     {
         secured_url = false;
     }
-    else if (endpoint_url.rfind("https://", 0) == 0)
+    else if (endpoint_lc.rfind("https://", 0) == 0)
     {
         secured_url = true;
     }
     else
     {
         LOG(ERROR) << "Invalid S3 endpoint url";
         std::abort();
     }
 ...
-    config.endpointOverride = endpoint_url;
+    config.endpointOverride = endpoint;  // preserve original case
 ...
-    // Disable SSL verification for HTTPS
-    config.verifySSL = false;
+    // Verify TLS when using HTTPS
+    config.verifySSL = secured_url;
```
423-431: Manifest scan uses legacy fields; ignores URL overrides.

Use the effective bucket/prefix/object_path already applied to cfs_options_.

```diff
-    bool ret = FindMaxTermFromCloudManifestFiles(storage_provider,
-                                                 cloud_config_.bucket_prefix_,
-                                                 cloud_config_.bucket_name_,
-                                                 cloud_config_.object_path_,
+    bool ret = FindMaxTermFromCloudManifestFiles(storage_provider,
+                                                 cfs_options_.src_bucket.GetBucketPrefix(),
+                                                 cfs_options_.src_bucket.GetBucketName(),
+                                                 cfs_options_.src_bucket.GetObjectPath(),
                                                  cloud_config_.branch_name_,
                                                  dss_shard_id,
                                                  cloud_manifest_prefix,
                                                  max_term);
```
639-646: Purger listener bucket uses legacy fields; ignores URL overrides.

Derive from the effective cfs_options passed to OpenCloudDB.

```diff
-    std::string bucket_name =
-        cloud_config_.bucket_prefix_ + cloud_config_.bucket_name_;
+    std::string bucket_name = cfs_options.src_bucket.GetBucketPrefix() +
+                              cfs_options.src_bucket.GetBucketName();
 ...
-        cloud_config_.object_path_,
+        cfs_options.src_bucket.GetObjectPath(),
```
🧹 Nitpick comments (11)
eloq_data_store_service/README.md (3)
145-153: Call out prefix removal and object_path requirement inline.

Add a short note under the URL example that bucket_prefix is ignored with URL config and the object_path segment is mandatory (URLs like `s3://bucket` only will fail validation). This prevents migration surprises.
Apply:
```diff
 rocksdb_cloud_s3_url = s3://my-eloqdata-bucket/rocksdb_cloud
+; Note: bucket_prefix is not supported with URL config. If you previously used a prefix,
+; fold it into the bucket name (e.g., s3://prod-my-eloqdata-bucket/rocksdb_cloud).
+; The object_path segment (after the bucket) is required.
```
155-163: MinIO example: add HTTPS and cert note.

Provide an HTTPS variant and a tip that custom CAs/self-signed certs require proper trust configuration rather than disabling verification in code.
Apply:
```diff
 rocksdb_cloud_s3_url = http://localhost:9000/my-bucket/rocksdb_data
+; or, with TLS:
+; rocksdb_cloud_s3_url = https://minio.local:9000/my-bucket/rocksdb_data
+; Ensure the TLS certificate is trusted by the host (install CA if self-signed).
```
286-300: Document strict URL parsing.

Explicitly note that:
- object_path cannot be empty (trailing slash like s3://bucket/ is invalid),
- no leading slash before bucket (s3:///bucket/... is invalid).
Add to Important Notes:
```diff
 5. **Validation:** Invalid URLs will cause the application to fail at startup with a descriptive error message.
+6. **URL shape:** The path segment after the bucket is required (no trailing slash-only paths). Examples:
+   - Valid: `s3://bucket/data`, `http://host:9000/bucket/data`
+   - Invalid: `s3://bucket/` (missing object_path), `s3:///bucket/data` (extra slash)
```

eloq_data_store_service/RocksDB_Configuration_Flags.md (2)
101-106: Clarify object_path is mandatory and no trailing slash.

State explicitly that the segment after the bucket must be non-empty; `s3://bucket/` is invalid and will be rejected.

Apply:

```diff
 **URL Format Specification:**
-- S3: `s3://{bucket_name}/{object_path}`
+- S3: `s3://{bucket_name}/{object_path}` (object_path is required; no trailing slash-only paths)
 - HTTP/HTTPS: `http(s)://{host}:{port}/{bucket_name}/{object_path}`
```
116-117: Precedence note is good. Add bucket_prefix reminder.

Reiterate here that bucket_prefix is ignored when URL is used, to avoid misconfiguration.
Apply:
```diff
 **Note:** If both `rocksdb_cloud_s3_url` and legacy options are provided, the URL-based configuration takes precedence and overrides the legacy settings.
+When URL-based config is active, any `rocksdb_cloud_bucket_prefix` value is ignored.
```

eloq_data_store_service/rocksdb_config.h (1)
118-120: noexcept on trivial accessor.

Mark `IsS3UrlConfigured` as `noexcept`.
```diff
-    bool IsS3UrlConfigured() const { return !s3_url_.empty(); }
+    bool IsS3UrlConfigured() const noexcept { return !s3_url_.empty(); }
```

eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
72-96: Update usage to include s3_url and deprecations.

Help users discover the new flag and legacy status.
```diff
-        "--bucket_name=BUCKET --bucket_prefix=PREFIX [options]\n"
+        "[--s3_url=s3://BUCKET/PATH | --bucket_name=BUCKET --bucket_prefix=PREFIX] [options]\n"
 ...
-        "  --bucket_name=BUCKET     S3 bucket name\n"
-        "  --bucket_prefix=PREFIX   S3 bucket prefix\n"
-        "  --object_path=PATH       S3 object path for RocksDB Cloud "
+        "  --s3_url=URL             URL form (recommended): s3://bucket/path or http(s)://host:port/bucket/path\n"
+        "  --bucket_name=BUCKET     Legacy: S3 bucket name\n"
+        "  --bucket_prefix=PREFIX   Legacy: S3 bucket prefix (ignored with --s3_url)\n"
+        "  --object_path=PATH       Legacy: S3 object path for RocksDB Cloud "
+        "  --s3_endpoint=URL        Legacy: Custom S3 endpoint URL (use URL host in --s3_url instead)\n"
```
378-386: Transfer Manager toggle is endpoint-driven; log rationale.

Optional: add a short log when disabling transfer manager due to custom endpoint, to aid ops.
```diff
     if (!effective_endpoint_url.empty())
     {
         cfs_options_.s3_client_factory =
             BuildS3ClientFactory(effective_endpoint_url);
         // Intermittent and unpredictable IOError happend from time to
         // time when using aws transfer manager with minio. Disable aws
         // transfer manager if endpoint is set (minio).
         cfs_options_.use_aws_transfer_manager = false;
+        LOG(INFO) << "Custom S3 endpoint set; disabling AWS Transfer Manager.";
     }
```

eloq_data_store_service/rocksdb_config.cpp (3)
296-304: Clarify URL style and consider guarding the flag.
- Help text implies path-style addressing only; explicitly say virtual-hosted-style (e.g., https://bucket.s3.amazonaws.com/path) is not supported to avoid confusion.
- For consistency with other cloud flags, consider wrapping this DEFINE under the same S3/GCS build guard to hide it from non-cloud builds.
Apply this doc tweak (optional):
```diff
 DEFINE_string(rocksdb_cloud_s3_url,
               "",
-              "RocksDB cloud S3 URL. Format: s3://{bucket}/{path} or "
-              "http(s)://{host}:{port}/{bucket}/{path}. "
+              "RocksDB cloud S3 URL. Supported formats (path-style only): "
+              "s3://{bucket}/{path} or http(s)://{host}[:port]/{bucket}/{path}. "
+              "Virtual-hosted style (e.g., https://{bucket}.s3.{region}.amazonaws.com/{path}) is not supported. "
               "Examples: s3://my-bucket/my-path, "
               "http://localhost:9000/my-bucket/my-path. "
               "This option takes precedence over legacy configuration options "
               "if both are provided");
```

Optionally guard the flag:
```diff
+#if (defined(DATA_STORE_TYPE_ELOQDSS_ROCKSDB_CLOUD_S3) || \
+     defined(DATA_STORE_TYPE_ELOQDSS_ROCKSDB_CLOUD_GCS))
 DEFINE_string(rocksdb_cloud_s3_url,
               ...
               "if both are provided");
+#endif
```
307-399: Make URL parsing more robust (scheme normalization, slash handling, minor shadowing).

Parser works for the documented formats but can be hardened:
- Normalize scheme to lowercase to accept “S3/HTTP/HTTPS”.
- Trim redundant leading slashes after the endpoint split; strip trailing slashes from object_path.
- Avoid reusing the name first_slash in two scopes (small readability nit).
Apply minimally invasive changes:
```diff
 S3UrlComponents ParseS3Url(const std::string &s3_url)
 {
     S3UrlComponents result;
@@
-    result.protocol = s3_url.substr(0, protocol_end);
+    result.protocol = s3_url.substr(0, protocol_end);
+    // Normalize protocol to lowercase
+    std::transform(result.protocol.begin(),
+                   result.protocol.end(),
+                   result.protocol.begin(),
+                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
@@
-    if (result.protocol == "http" || result.protocol == "https")
+    if (result.protocol == "http" || result.protocol == "https")
     {
         // Find the first slash after the host:port
-        size_t first_slash = remaining.find('/');
-        if (first_slash == std::string::npos)
+        size_t endpoint_sep = remaining.find('/');
+        if (endpoint_sep == std::string::npos)
         {
             result.error_message =
                 "Invalid S3 URL format: missing bucket and object path";
             return result;
         }

         // Store the full endpoint URL including protocol
-        result.endpoint_url =
-            result.protocol + "://" + remaining.substr(0, first_slash);
-        remaining = remaining.substr(first_slash + 1);
+        result.endpoint_url =
+            result.protocol + "://" + remaining.substr(0, endpoint_sep);
+        remaining = remaining.substr(endpoint_sep + 1);
+        // Remove any leading slashes before bucket
+        while (!remaining.empty() && remaining.front() == '/')
+        {
+            remaining.erase(0, 1);
+        }
     }
@@
-    size_t first_slash = remaining.find('/');
-    if (first_slash == std::string::npos)
+    size_t key_sep = remaining.find('/');
+    if (key_sep == std::string::npos)
     {
         result.error_message =
             "Invalid S3 URL format: missing object path (format: "
             "{bucket_name}/{object_path})";
         return result;
     }
-    result.bucket_name = remaining.substr(0, first_slash);
-    result.object_path = remaining.substr(first_slash + 1);
+    result.bucket_name = remaining.substr(0, key_sep);
+    result.object_path = remaining.substr(key_sep + 1);
+    // Normalize object_path: strip trailing slashes
+    while (!result.object_path.empty() && result.object_path.back() == '/')
+    {
+        result.object_path.pop_back();
+    }
```

Note: you'll need `#include <algorithm>` and `#include <cctype>` at the top.

Please confirm `S3UrlComponents` default-initializes `is_valid` to false; otherwise set `result.is_valid = false;` on construction paths.
613-619: Precedence is read; enforce it and surface conflicts.

You read s3_url_ first (good). When both URL and legacy fields are set, ensure downstream code:
- Ignores legacy bucket_prefix_/bucket_name_/object_path_ and logs a clear warning, or
- Fails fast if both are present.
If not already enforced elsewhere, add a post-load check in this ctor after legacy fields are read:
```cpp
if (!s3_url_.empty())
{
    if (!bucket_prefix_.empty())
    {
        LOG(WARNING) << "rocksdb_cloud_bucket_prefix is ignored when "
                        "rocksdb_cloud_s3_url is set.";
        bucket_prefix_.clear();
    }
    // Consider warning for bucket_name_/object_path_ too, since the URL is
    // the source of truth.
}
```

Would you like a follow-up patch plus unit tests covering precedence and logging?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- `eloq_data_store_service/README.md` (3 hunks)
- `eloq_data_store_service/RocksDB_Configuration_Flags.md` (1 hunk)
- `eloq_data_store_service/rocksdb_cloud_data_store.cpp` (2 hunks)
- `eloq_data_store_service/rocksdb_cloud_dump.cpp` (2 hunks)
- `eloq_data_store_service/rocksdb_config.cpp` (4 hunks)
- `eloq_data_store_service/rocksdb_config.h` (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
eloq_data_store_service/rocksdb_config.h (1)
eloq_data_store_service/rocksdb_config.cpp (2)
- `ParseS3Url` (310-397)
- `ParseS3Url` (310-310)
eloq_data_store_service/rocksdb_config.cpp (1)
eloq_data_store_service/main.cpp (2)
- `CheckCommandLineFlagIsDefault` (90-102)
- `CheckCommandLineFlagIsDefault` (90-90)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
eloq_data_store_service/rocksdb_config.cpp (2)
- `ParseS3Url` (310-397)
- `ParseS3Url` (310-310)
eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
eloq_data_store_service/rocksdb_config.cpp (2)
- `ParseS3Url` (310-397)
- `ParseS3Url` (310-310)
🔇 Additional comments (3)
eloq_data_store_service/rocksdb_config.h (1)
74-83: S3UrlComponents looks good.

Structure and defaults are sensible.
eloq_data_store_service/rocksdb_config.cpp (2)
275-275: LGTM: default-on purger flag.

Enabling the purger by default is reasonable and matches the constructor wiring. No issues here.
677-677: Overflow concern properly addressed.

The code correctly implements the fix. At lines 677-678, `rocksdb_cloud_purger_periodicity_secs` is declared as `uint64_t`, converting the uint32 flag value before the multiplication by 1000 at line 690. This avoids overflow. The target field `purger_periodicity_millis_` is `size_t`, which is 64-bit on modern systems.
Force-pushed: 56eb83c to cefcd85
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
422-430: Fix inconsistent bucket configuration usage in FindMaxTermFromCloudManifestFiles and PurgerEventListener.

Lines 423-430 and 640 use `cloud_config_.bucket_prefix_`, `cloud_config_.bucket_name_`, and `cloud_config_.object_path_` directly, but these should use the effective configuration values computed at lines 301-331. When S3 URL-based configuration is active, `effective_bucket_prefix` is explicitly set to empty (line 324), whereas `cloud_config_.bucket_prefix_` retains the legacy value. This causes FindMaxTermFromCloudManifestFiles to search with incorrect paths and PurgerEventListener to use wrong bucket names when URL config is active.

Use the effective configuration variables:

- Replace `cloud_config_.bucket_prefix_` with `effective_bucket_prefix`
- Replace `cloud_config_.bucket_name_` with `effective_bucket_name`
- Replace `cloud_config_.object_path_` with `effective_object_path`
🧹 Nitpick comments (1)
eloq_data_store_service/rocksdb_config.h (1)
92-177: Use distinct variable names for improved clarity.

The variable `first_slash` is declared at line 136 (inside the http/https block) and again at line 152 (after the block). While technically in different scopes, consider renaming to `endpoint_slash` and `bucket_slash` respectively for better readability.

Regarding other edge cases: trailing slashes in URLs like `s3://bucket/path/` will result in `object_path = "path/"` with the trailing slash preserved. This appears to be expected behavior. URL-encoded characters (e.g., `%20`) don't appear to be supported by this parser, which aligns with the codebase design: URL decoding is expected to be handled by the caller before passing URLs to `ParseS3Url`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- `eloq_data_store_service/README.md` (3 hunks)
- `eloq_data_store_service/RocksDB_Configuration_Flags.md` (1 hunk)
- `eloq_data_store_service/rocksdb_cloud_data_store.cpp` (4 hunks)
- `eloq_data_store_service/rocksdb_cloud_dump.cpp` (2 hunks)
- `eloq_data_store_service/rocksdb_config.cpp` (4 hunks)
- `eloq_data_store_service/rocksdb_config.h` (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- eloq_data_store_service/README.md
- eloq_data_store_service/rocksdb_cloud_dump.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
eloq_data_store_service/rocksdb_config.h (1)
ParseS3Url(92-177)
eloq_data_store_service/rocksdb_config.cpp (1)
eloq_data_store_service/main.cpp (2)
- `CheckCommandLineFlagIsDefault` (90-102)
- `CheckCommandLineFlagIsDefault` (90-90)
🔇 Additional comments (9)
eloq_data_store_service/rocksdb_config.h (2)
74-82: LGTM! Well-structured result type.

The `S3UrlComponents` struct provides a clear way to return parsed URL components with validation state and error messaging.
197-204: LGTM! Clean integration with existing config structure.

The new `s3_url_` member and `IsS3UrlConfigured()` accessor follow the existing patterns in `RocksDBCloudConfig`.

eloq_data_store_service/RocksDB_Configuration_Flags.md (1)
90-117: LGTM! Clear documentation with good migration guidance.

The documentation effectively communicates:
- The new URL-based configuration approach
- Supported URL formats with examples
- Deprecation of legacy options
- Precedence rules when both configurations are present
This will help users understand the new feature and migrate from legacy configuration.
eloq_data_store_service/rocksdb_cloud_data_store.cpp (4)
299-334: LGTM! Clean precedence logic with good validation.

The effective configuration determination correctly:
- Starts with legacy configuration as defaults
- Parses and validates the S3 URL when provided
- Overrides legacy values with URL-derived values (including setting bucket_prefix to empty)
- Logs the effective configuration for debugging
The error handling with `LOG(FATAL)` and detailed error messages will help users quickly identify and fix invalid URL formats.
336-343: LGTM! Correct application of effective configuration.

The code properly applies the effective configuration values to both source and destination buckets in `cfs_options_`. The region remains from `cloud_config_.region_` as expected, since it's not part of the URL-based configuration.
378-392: LGTM! Proper endpoint handling with AWS transfer manager logic.

The code correctly:

- Uses `effective_endpoint_url`, which supports both URL-based and legacy configuration
- Builds a custom S3 client factory when an endpoint is specified
- Disables AWS transfer manager for custom endpoints (MinIO compatibility)
- Enables AWS transfer manager for standard AWS S3
639-645: Inconsistent use of bucket configuration in PurgerEventListener.

Similar to the issue at lines 424-430, this code concatenates `cloud_config_.bucket_prefix_` with `cloud_config_.bucket_name_` when initializing `PurgerEventListener`. When URL-based configuration is used:

- The effective bucket prefix should be empty (per line 321)
- The effective bucket name comes from the URL

Consider using effective configuration values here as well to maintain consistency with the URL-based configuration paradigm.
eloq_data_store_service/rocksdb_config.cpp (2)
296-303: LGTM! Well-documented flag definition.

The `rocksdb_cloud_s3_url` flag is well-defined with:
- Clear format specification
- Practical examples
- Explicit precedence documentation
- Appropriate default value (empty string for opt-in behavior)
519-525: LGTM! Consistent precedence handling.

The S3 URL configuration initialization correctly implements the precedence order (command line > config file > default) and follows the exact pattern used for all other configuration flags in this constructor.
Commits in this push:

- Add s3_url for rocksdb_cloud_dump
- Move parse s3_url into rocksdb_cloud_data_store.cpp
- Enable verifySSL for S3 endpoint
- Move ParseS3Url

Force-pushed: 70ef679 to 0dd505b
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
422-430: Fix inconsistent configuration usage: FindMaxTermFromCloudManifestFiles should use effective config values, not legacy config.

The `FindMaxTermFromCloudManifestFiles` call on lines 423-430 uses `cloud_config_.bucket_prefix_`, `cloud_config_.bucket_name_`, and `cloud_config_.object_path_` directly from the original config instead of the computed effective values. This causes incorrect behavior when URL-based configuration is used, as the URL-based config sets `effective_bucket_prefix = ""` (line 321) while `FindMaxTermFromCloudManifestFiles` would still use the legacy `cloud_config_.bucket_prefix_` value.

Update the call to use effective configuration values for consistency with the cfs_options initialization (lines 336-343):
```diff
     bool ret = FindMaxTermFromCloudManifestFiles(storage_provider,
-                                                 cloud_config_.bucket_prefix_,
-                                                 cloud_config_.bucket_name_,
-                                                 cloud_config_.object_path_,
+                                                 effective_bucket_prefix,
+                                                 effective_bucket_name,
+                                                 effective_object_path,
                                                  cloud_config_.branch_name_,
                                                  dss_shard_id,
                                                  cloud_manifest_prefix,
                                                  max_term);
```

Additionally, lines 639-645 in the event listener initialization also use legacy config directly (bucket_name and object_path). These should similarly be updated to use effective configuration values.
🧹 Nitpick comments (2)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
299-334: Verify LOG(FATAL) behavior and improve error message.

The URL parsing and override logic is well-implemented with clear logging. However:
LOG(FATAL) terminates the process immediately - this is appropriate for invalid configuration at startup, but ensure this is the intended behavior for your deployment model.
Error message could be more actionable: Consider adding a suggestion about which configuration to check or remove:
```diff
         LOG(FATAL) << "Invalid rocksdb_cloud_s3_url: "
                    << url_components.error_message
                    << ". URL format: s3://{bucket}/{path} or "
                       "http(s)://{host}:{port}/{bucket}/{path}. "
                    << "Examples: s3://my-bucket/my-path, "
-                   << "http://localhost:9000/my-bucket/my-path";
+                   << "http://localhost:9000/my-bucket/my-path. "
+                   << "Please correct the rocksdb_cloud_s3_url configuration "
+                      "or remove it to use legacy configuration.";
```

eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
52-53: Fix inconsistent flag description format.

The `bucket_prefix` flag description on line 53 should match the format of other legacy flags for consistency:

```diff
-DEFINE_string(bucket_prefix, "", "S3 bucket prefix (legacy, not supported with s3_url)");
+DEFINE_string(bucket_prefix, "", "S3 bucket prefix (legacy, use s3_url instead)");
```

The phrase "not supported with s3_url" might be confusing; it's better to consistently use "use s3_url instead" like the other flags.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- `eloq_data_store_service/README.md` (3 hunks)
- `eloq_data_store_service/RocksDB_Configuration_Flags.md` (1 hunk)
- `eloq_data_store_service/rocksdb_cloud_data_store.cpp` (4 hunks)
- `eloq_data_store_service/rocksdb_cloud_dump.cpp` (2 hunks)
- `eloq_data_store_service/rocksdb_config.cpp` (4 hunks)
- `eloq_data_store_service/rocksdb_config.h` (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
eloq_data_store_service/rocksdb_config.cpp (2)
eloq_data_store_service/main.cpp (2)
- `CheckCommandLineFlagIsDefault` (90-102)
- `CheckCommandLineFlagIsDefault` (90-90)

rocksdb_config.cpp (2)

- `CheckCommandLineFlagIsDefault` (298-310)
- `CheckCommandLineFlagIsDefault` (298-298)
eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
eloq_data_store_service/rocksdb_config.h (1)
ParseS3Url(92-177)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
eloq_data_store_service/rocksdb_config.h (1)
ParseS3Url(92-177)
🔇 Additional comments (6)
eloq_data_store_service/README.md (1)
199-300: LGTM! Comprehensive migration guide.

The migration guide is well-structured and provides clear examples for different use cases (AWS S3, MinIO, HTTPS S3-compatible storage). The precedence rules and backward compatibility notes are helpful for users migrating from legacy configuration.
eloq_data_store_service/RocksDB_Configuration_Flags.md (1)
90-117: LGTM! Clear documentation of URL-based configuration.

The documentation clearly explains the new URL format, provides examples, and marks legacy configuration as deprecated. The precedence rules are explicit and helpful.
eloq_data_store_service/rocksdb_config.h (1)
74-82: LGTM! Well-designed URL components struct.

The struct provides all necessary fields for storing parsed URL components with proper validation state and error reporting.
eloq_data_store_service/rocksdb_config.cpp (2)
296-303: LGTM! Well-documented flag definition.

The flag definition includes comprehensive description with format examples and precedence notes. This will help users understand how to use the new URL-based configuration.
519-525: LGTM! Correct precedence implementation.

The s3_url_ initialization follows the established pattern of command-line flags taking precedence over configuration file values. This is consistent with the rest of the configuration system.
eloq_data_store_service/rocksdb_cloud_dump.cpp (1)
125-154: LGTM! Consistent URL parsing implementation.

The URL parsing logic in the dump utility correctly mirrors the main data store implementation with appropriate error handling and logging.
Reviewed code (`eloq_data_store_service/rocksdb_config.h`, `ParseS3Url`):

```cpp
inline S3UrlComponents ParseS3Url(const std::string &s3_url)
{
    S3UrlComponents result;

    if (s3_url.empty())
    {
        result.error_message = "S3 URL is empty";
        return result;
    }

    // Find protocol separator
    size_t protocol_end = s3_url.find("://");
    if (protocol_end == std::string::npos)
    {
        result.error_message = "Invalid S3 URL format: missing '://' separator";
        return result;
    }

    result.protocol = s3_url.substr(0, protocol_end);

    // Validate protocol
    if (result.protocol != "s3" && result.protocol != "http" &&
        result.protocol != "https")
    {
        result.error_message = "Invalid protocol '" + result.protocol +
                               "'. Must be one of: s3, http, https";
        return result;
    }

    // Extract the part after protocol
    size_t path_start = protocol_end + 3;  // Skip "://"
    if (path_start >= s3_url.length())
    {
        result.error_message = "Invalid S3 URL format: no content after protocol";
        return result;
    }

    std::string remaining = s3_url.substr(path_start);

    // For http/https, extract the endpoint (host:port) and then the bucket/path
    // Format: http(s)://{host}:{port}/{bucket_name}/{object_path}
    if (result.protocol == "http" || result.protocol == "https")
    {
        // Find the first slash after the host:port
        size_t first_slash = remaining.find('/');
        if (first_slash == std::string::npos)
        {
            result.error_message =
                "Invalid S3 URL format: missing bucket and object path";
            return result;
        }

        // Store the full endpoint URL including protocol
        result.endpoint_url = result.protocol + "://" +
                              remaining.substr(0, first_slash);
        remaining = remaining.substr(first_slash + 1);
    }

    // Now extract bucket_name and object_path from remaining
    // Format: {bucket_name}/{object_path}
    size_t first_slash = remaining.find('/');
    if (first_slash == std::string::npos)
    {
        result.error_message =
            "Invalid S3 URL format: missing object path (format: "
            "{bucket_name}/{object_path})";
        return result;
    }

    result.bucket_name = remaining.substr(0, first_slash);
    result.object_path = remaining.substr(first_slash + 1);

    if (result.bucket_name.empty())
    {
        result.error_message = "Bucket name cannot be empty";
        return result;
    }

    if (result.object_path.empty())
    {
        result.error_message = "Object path cannot be empty";
        return result;
    }

    result.is_valid = true;
    return result;
}
```
Add validation for edge cases in URL parsing.
The `ParseS3Url` function has valid validation logic but misses several edge cases:

- **Trailing slashes:** URLs like `s3://bucket/path/` will include the trailing slash in `object_path`, which may be inconsistent with expected behavior.
- **Double slashes:** Handling is inconsistent: `s3://bucket//path` is accepted (storing `/path` as object_path), while `http://host:9000//bucket/path` is rejected due to empty bucket_name. Consider normalizing or consistently rejecting such URLs.
- **Port validation:** For http/https URLs, the endpoint URL is stored as-is without validating that the port (if present) is numeric. For example, `http://host:invalid/bucket/path` would be accepted with `endpoint_url = "http://host:invalid"`, potentially causing downstream failures.
- **URL encoding:** The parser does not decode URL-encoded characters (e.g., `%20`), though this may be acceptable depending on whether URLs are expected to be pre-decoded from configuration sources.
Consider adding explicit validation for numeric ports and documenting expected behavior for URLs with trailing/multiple slashes.